feat: add reusable workflow to build openedx images #1
Conversation
.github/workflows/build.yml
Outdated
with: | ||
repository: ${{ inputs.STRAIN_REPOSITORY }} | ||
ref: ${{ inputs.STRAIN_REPOSITORY_BRANCH }} | ||
token: ${{ secrets.REPO_DEPLOYMENT_KEY }} |
[comment to reviewers]: the default value in the ednx-strains workflow is the default GitHub token (see here), which is unnecessary since the workflow is used in the same repository. Instead, the token should correspond to a personal access token from a service account, or to an SSH key also belonging to a service account, so the workflow can check out other repositories: https://github.com/actions/checkout?tab=readme-ov-file#usage
Which one do you recommend?
I mentioned checking out other repositories to keep backward compatibility, but if we decide to stick with this implementation, we should revisit whether there's a use case for it.
Are we expecting that the other organizations interested in using this workflow will all have their own strains repo?
Not necessarily. Unless teams need to change defaults from the caller workflow (e.g., https://github.com/eduNEXT/ednx-strains/blob/master/.github/workflows/build.yml#L39-L42), they shouldn't need to.
We still need to figure out whether to configure a token or SSH key. Our current setup uses an SSH key associated with a service account. So, I'll implement this same approach in another commit.
Here's the commit: a43ca13. I created another strains repo (private) to test whether the workflow checks it out correctly: https://github.com/eduNEXT/strains
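For context, a minimal sketch of what the SSH-based checkout might look like (the secret name SSH_PRIVATE_KEY is an assumption; it is not necessarily what a43ca13 uses):

- name: Checkout strain repository
  uses: actions/checkout@v4
  with:
    repository: ${{ inputs.STRAIN_REPOSITORY }}
    ref: ${{ inputs.STRAIN_REPOSITORY_BRANCH }}
    # actions/checkout accepts an SSH private key instead of a token; the
    # matching public key must be registered as a deploy key on the strain
    # repository or on the service account (assumption: secret name may differ)
    ssh-key: ${{ secrets.SSH_PRIVATE_KEY }}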
@@ -0,0 +1,161 @@

name: Picasso V2
[comment to reviewers]: this is a private repository, so only repositories from the edunext organization can use the workflow; see this documentation for more details. According to our previous discussions, the plan is for this workflow to also be used by other organizations, which means this repository should be public.
Is that feasible for this repository, considering that the Jenkins pipeline implementation is private as well?
At first glance, I don't see any major issues with making this repo public. I would like to hear from those who worked on the Jenkins implementation, to see if this was previously discussed and to understand whether there was a specific reason for keeping it private. It might simply be that they didn't see a need for it to be public, since Jenkins grants access to the service to as many people as needed without requiring access to the script repository.
I'm sorry, I should've added this context to my previous comment but I forgot. I created the dedalo-scripts repository and also committed the first changes. I didn't know at the time whether it would be a long-term solution or what implementations would be hosted there, so I used private as the default visibility setting. Now, I'd argue against this decision. The dedalo-scripts repository looks like a team-specific solution, but it shouldn't be, mainly because it's solving the tech-wide problem of how to build Open edX images.
Given that, this repo should be public regardless of any decision made about the visibility of dedalo-scripts.
.github/workflows/build.yml (outdated)

inputs:
  STRAIN_REPOSITORY:
    description: 'The repository for strains to checkout. It should follow the format: ORGANIZATION/REPO'
    required: true
    type: string
  STRAIN_REPOSITORY_BRANCH:
    description: 'The branch of the repository to checkout'
    required: true
    type: string
  STRAIN_PATH:
    description: 'The path within the repository for strains'
    required: true
    type: string
  SERVICE:
    description: 'The service name to build'
    required: true
    type: string
secrets:
  DOCKERHUB_USERNAME:
    description: 'DockerHub username for login'
    required: true
  DOCKERHUB_PASSWORD:
    description: 'DockerHub password for login'
    required: true
  REPO_DEPLOYMENT_KEY:
    description: 'Deployment key for repository checkout'
    required: true
[comment to reviewers]: What other types of configurations do you find useful?
run: |
  . .tvm/bin/activate

  major_version=$(pip show tutor | grep Version | awk '{print $2}' | cut -d'.' -f1)
  distro_version=$(
    git ls-remote --tags https://github.com/eduNEXT/tutor-contrib-edunext-distro \
      | grep -E "$major_version\\.[0-9]+\\.[0-9]+$" \
      | tail -n 1 \
      | awk -F'/' '{print $NF}' \
  )

  pip install git+https://github.com/eduNEXT/tutor-contrib-edunext-distro@$distro_version
  tutor plugins enable distro
[comment to reviewers]: I removed these lines from the original script since we're not using the plugin index to install distro:
tutor plugins update
tutor plugins list
Please let me know if you think otherwise.
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}/${{ env.TUTOR_APP_NAME }}
run: |
  . .tvm/bin/activate
[comment to reviewers]: Which environment variables does this script set? I think we can store those in the GitHub environment, so there's no need to activate the tvm environment before running each step.
I agree that we could find a workaround to avoid activating the environment in each step, but I think we can leave it as an improvement for another PR.
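For reference, one possible shape of that workaround: activate once and persist the relevant variables so later steps inherit them. This is only a sketch; it assumes tvm's activate script mainly sets VIRTUAL_ENV and prepends the virtualenv's bin directory to PATH, which hasn't been verified here:

- name: Activate TVM once
  shell: bash
  run: |
    . .tvm/bin/activate
    # Persist the virtualenv for subsequent steps (assumption: activation
    # only sets VIRTUAL_ENV and adjusts PATH)
    echo "VIRTUAL_ENV=$VIRTUAL_ENV" >> "$GITHUB_ENV"
    echo "$VIRTUAL_ENV/bin" >> "$GITHUB_PATH"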
echo "[worker.oci]" > buildkit.toml | ||
echo "max-parallelism = 2" >> buildkit.toml | ||
docker buildx create --use --node=max2cpu --driver=docker-container --config=./buildkit.toml |
[note to author]: add a comment explaining why we're doing this.
Does this connection error with the parallel build also occur in the Jenkins pipeline? Could this potentially slow down build times or impact us in other ways?
I asked internally about the Jenkins agent's configuration but am still waiting for an answer, so I'll ask again. I'll post here once I know more.
And yes, limiting parallelism will slow down the build process compared to the default BuildKit configuration. Here's a successful build with this BuildKit config, which takes about 20 minutes. I'll compare this with our current build time for MFEs in Jenkins.
I was also thinking about taking this to the Tutor Users' Group meeting because it seems like it hasn't been addressed upstream yet.
Thanks!
Taking this to the Tutor Users' Group seems like a good idea. And if it's something that can be resolved in the near future, and we can help push it forward, then I don't think it should be a major concern for us.
I agree with you that this is not a blocker for the MVP. I opened this issue so we can work on it later: #2
run: |
  . .tvm/bin/activate
  tutor config save
  tutor images build $SERVICE --no-cache
[note to author]: I don't think we need the --no-cache argument.
Could you explain why you think we wouldn't need this, especially considering that Picasso builds the images with the same --no-cache argument and we want to compare the build times as fairly and accurately as possible?
Of course. Our current Jenkins setup uses dynamically provisioned agents, each with a maximum number of concurrent builds. As I understand it, --no-cache prevents one of those concurrent builds from using Docker cache layers from another concurrent build. That wouldn't happen in this setup, since no concurrent builds exist on the same GitHub runner. Also, each runner is provisioned from scratch, without using cache of any form, including from previous builds. Therefore, this setup would behave the same with or without the --no-cache flag.
@Alec4r @MaferMazu, please correct me if I'm wrong. Thanks!
We can decide after we get some numbers and run some tests with the MVP, but I agree with you so far.
Thanks for this @mariajgrimaldi. We could make other adjustments later if we need to, but I think this is good enough for an MVP.
This seems to be functional enough to show to the other teams and get their feedback on whether this new implementation is useful and easy to adapt to their use cases.
@magajh, thank you for your feedback! I've published this PR in our discussion post, so other teams can also review the solution.
@mariajgrimaldi I was thinking that maybe, instead of creating a workflow for building Docker images, it might be more convenient to implement it as a GitHub Action. This way, each team or company could create their own workflow using the action, which would be much more flexible. Additionally, it would make it easier to support pushing to different Docker registries in the future. Centralizing everything in a GitHub Action would give us the flexibility to adapt to different needs without having to modify multiple workflows. What do you think? Could we evaluate the possibility of doing it as an action instead of a workflow?
Hi, @mariajgrimaldi. I don't have any specific comments on the implementation; however, I do have some general comments.
Thank you, @Alec4r, for your comments!
Thank you for the suggestion! Implementing this as a composite action was my first approach; it looked something like this, but then I changed it to a reusable workflow. I found debugging difficult when using it as an action: all the workflow steps happen in a single stage of the caller workflow, making reviewing the logs more difficult. Here's an example of the output when using a composite action: https://github.com/eduNEXT/ednx-strains/actions/runs/10172503969/job/28135183468. I don't think we fully solve this problem by using a reusable workflow instead, because there are still a lot of logs to go through when debugging, but it's more organized since every single step is logged independently. Also, when implementing these bundles of instructions, we're assuming a few things, such as that they run on Ubuntu. How can we enforce that when using an action? In any case, I'm not against implementing it as an action, but I'd like to understand better why that option would be better, so I opened a new issue where we can discuss it further: #4. Changing this implementation to a composite action is not much work, so we can keep the conversation open.
This is enough to use a reusable workflow: https://github.com/eduNEXT/ednx-strains/blob/master/.github/workflows/build.yml#L30-L42. A reusable workflow is called within a job, unlike actions, which are called as a step, but the mechanics are very similar. Therefore, we shouldn't have to modify multiple workflows, just our centralized definition.
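To illustrate the difference between the two consumption models, here is a rough sketch of two separate caller workflow files (the repository paths and the action name are illustrative, not the actual ones):

# (a) Reusable workflow: consumed at the job level of the caller workflow
jobs:
  build:
    uses: eduNEXT/picasso/.github/workflows/build.yml@main  # illustrative path
    with:
      SERVICE: mfe
    secrets: inherit

# (b) Composite action (hypothetical): consumed as a step inside a job
jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - uses: eduNEXT/picasso-action@main  # hypothetical action repo
        with:
          service: mfe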
Thank you, @Squirrel18, for the feedback!
Thanks for the suggestion. I opened a new issue to discuss this further: #5
Yes! Our main goal with this workflow is that it can be easily used in other repositories by calling it from any workflow, e.g.: https://github.com/eduNEXT/ednx-strains/blob/master/.github/workflows/build.yml#L30-L42
You're right. We don't need TVM since the environment is already isolated.
This would work if and only if "internal" means within the eduNEXT organization only: https://docs.github.com/en/actions/creating-actions/sharing-actions-and-workflows-from-your-private-repository#about-github-actions-access-to-private-repositories. I'm not sure how we can bypass that constraint so other orgs can use it.
Thanks for the answers, @mariajgrimaldi
I meant making it an open-source project. In my opinion, there is no need to make it open source: each vendor has its own build system, and there is not much benefit in terms of reputation.
Hi all, @Alec4r, @Squirrel18, and @magajh. Thanks for your comments; I really appreciate them. I'll merge this tomorrow morning (EST) as the first workflow version so we can start working on all the improvements discussed here and in other channels. Thanks again.
Description
Add a reusable workflow to build Open edX images based on the Picasso V2 pipeline definition. The main goal of this implementation is to prove the feasibility of adopting GitHub Actions to replace Jenkins for our current build process. For more details on the team's stance on the potential migration, please visit this post.
Implementation details
For this implementation, we first translated each step in the current Picasso V2 pipeline to its counterpart in GitHub Actions. Then, we refactored them into smaller steps where possible for easier management. Here's an overview of the translation of each stage into steps:
Jenkins pipeline stages
3.1 Set strain name for TVM use
3.2 Set tutor version for TVM use
4.1 Install TVM
4.2 Initialize TVM project
4.3 Copies strain into project folder
4.4 Generates tutor environment
5.1 Update plugin index
5.2 List plugins
5.3 Enable distro
5.4 Run syntax validation
5.5 Run distro run-extra-commands
5.6 Generates tutor environment
GitHub steps
5.1 Install TVM
5.2 Initialize TVM project
5.3 Copies strain into project folder
Other design details to take into account:
How to test
The build.yml workflow in ednx-strains actively uses the picasso/build.yml workflow and sets the necessary configurations to run correctly. To test the ednx-strains/.github/build.yml workflow execution, go to Actions > Build Open edX strain and fill in the necessary configuration, or use the defaults. I suggest using the MJG/redwood-test-image strain branch to avoid overriding the current image on DockerHub.

See here a successful build: https://github.com/eduNEXT/ednx-strains/actions/runs/10208033113/job/28243841700
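For reference, a minimal sketch of what the manual trigger in a caller workflow such as ednx-strains' build.yml might look like (the input name and default below are illustrative; see the linked workflow for the real definition):

on:
  workflow_dispatch:
    inputs:
      strain-branch:
        description: 'Strain branch to build (e.g. MJG/redwood-test-image)'
        default: 'MJG/redwood-test-image'
        type: string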